ABSTRACT


Intelligence-production activities are typically viewed as part of an intelligence cycle,
consisting of planning, collection, processing, analysis, and dissemination stages. Once a
request for information is issued, the intelligence agencies mostly deal with the collection
and processing activities of the cycle. However, in most situations, there is an enormous
amount of data to be collected. This overabundance of information requires methods that
select only the useful data, to prevent intelligence personnel from wasting time and effort
on irrelevant data. Online learning is an area of research that has gained attention in
recent years, with applications in areas such as web advertising, classification, and decision
making. In this thesis, we develop a model aimed at the collection and processing phases
of the intelligence cycle, applicable in situations where the data is obtained sequentially, so
that learning algorithms are realistic. We analyze the performance of a modified Thompson
Sampling algorithm to help intelligence analysts make good decisions regarding the
sources from which to collect/process, as well as the collection/processing capacity and
its allocation over time, in order to bound the risk of missing valuable information below a
certain threshold.




Table  of  Contents 


1 Introduction
    1.1 Introduction
    1.2 Research Questions and Methodology
    1.3 Scope
    1.4 Structure of the Thesis

2 Background and Literature Review
    2.1 Intelligence
    2.2 Online Learning
    2.3 Intelligence and Online Learning
    2.4 Related Studies

3 Model
    3.1 Setting
    3.2 Model
    3.3 Parameter Estimation after the Data Is Available

4 Analysis
    4.1 Algorithm
    4.2 Results
    4.3 Determining the Number of Sources to Sample (q)
    4.4 Risk vs. Resource Allocated
    4.5 Using Posteriors to Learn about the Distribution of

5 Conclusion and Further Study
    5.1 Conclusion
    5.2 Further Study

List of References

Initial Distribution List


List  of  Figures 

Figure 2.1 Intelligence Cycle
Figure 2.2 Relationship between Data, Information and Intelligence
Figure 2.3 Pseudocode of Exponential-weight Algorithm for Exploration and Exploitation (Exp3) Algorithm
Figure 4.1 Pseudocode of the Algorithm for Arbitrary Priors
Figure 4.2 Pdf of the Distribution of the ps
Figure 4.3 Cumulative Regret for One Population
Figure 4.4 Non-cumulative Regret for One Population
Figure 4.5 Regret for One Population
Figure 4.6 Pdf 1
Figure 4.7 Pdf 2
Figure 4.8 Cumulative Regret for Two (Mix) Populations
Figure 4.9 Non-cumulative Regret for Two (Mix) Populations
Figure 4.10 Regret for Two Populations
Figure 4.11 q = 10
Figure 4.12 q = 20
Figure 4.13 Regret Obtained by Different q Values
Figure 4.14 Learning with Dynamic q
Figure 4.15 Change in q
Figure 4.16 Pseudocode of the Dynamic Algorithm
Figure 4.17 Tradeoff between Allocated Resource per Time Period (q) and the Risk
Figure 4.18 Risk according to q and Time Horizon
Figure 4.19 Assumed (True) Distribution and Histogram of 100 Samples
Figure 4.20 Fitted Distribution and Histogram of 100 Means of Posteriors
Figure 4.21 Fitted Mixture Distributions at Different Time Periods



List  of  Tables 


Table 1.1 Examples for Sources and Capacities (Resources) for Different Phases
Table 3.1 Examples for Sources and Capacities (Resources) for Different Phases
Table 4.1 Minimum q Values Required
Table 4.2 Estimated Risk for Allocated Resources (q) (S = 100 and T = 300)
Table 4.3 Comparison of the True Weights and Parameters to Those Fitted



List  of  Acronyms  and  Abbreviations 


CIA             Central Intelligence Agency
COMINT          Communications Intelligence
CYBINT/DNINT    Cyber Intelligence/Digital Network Intelligence
DoD             Department of Defense
EM              Expectation Maximization
Exp3            Exponential-weight Algorithm for Exploration and Exploitation
FBI             Federal Bureau of Investigation
FININT          Financial Intelligence
GEOINT          Geospatial Intelligence
HUMINT          Human Intelligence
IC              Intelligence Community
MAB             Multi-Armed Bandit
MASINT          Measurement and Signature Intelligence
NPS             Naval Postgraduate School
OSINT           Open Source Intelligence
pdf             Probability Density Function
SIGINT          Signals Intelligence
TECHINT         Technical Intelligence
TS              Thompson Sampling
UCB             Upper Confidence Bound
U.S.            United States


Executive  Summary 


Intelligence activities are part of a framework called the intelligence cycle. The main phases in
the cycle are planning, collection, processing, analysis, and dissemination of information.
Some of the data collected from the operational environment is discarded due to irrelevance.
This discarded data represents wasted time and effort for the intelligence organizations.

As stated in [1], “The goal of online learning is to make a sequence of accurate
predictions given knowledge of the correct answer to previous prediction tasks and possibly
additional available information.” Online learning is used when learning with training data
is infeasible or the data is non-stationary. It is also used to adapt to changes in the
environment. Learning as one goes along can be more robust than specifying a model and
using mathematical optimization [2]. Although online learning was first defined in the
machine-learning literature, its applications can be found in other areas such as optimization,
game theory, and statistical modeling.

The  main  goal  of  the  thesis  is  to  address  the  challenge  of  how  to  efficiently  explore  the  data 
originating  from  a  large  number  of  sources.  For  this  purpose,  we  propose  a  model  that  is 
suitable  for  most  of  the  situations  in  the  collection  and  processing  phases.  In  particular,  we 
modify  the  well-known  Thompson  Sampling  (TS)  algorithm,  originating  in  the  machine- 
learning  community,  to  sample  from  more  than  one  source  at  a  time,  as  is  the  case  in 
intelligence  organizations  that  analyze  large  numbers  of  items  concurrently. 

We  address  the  following  questions: 


•  How  can  we  create  a  balance  between  exploration  and  exploitation  to  maximize  the 
benefit  when  making  decisions  as  to  the  collection,  processing,  or  analysis  of  data  if 
there  are  time  or  resource  constraints? 

•  How can we decide what amount of resources should be allocated for collection,
processing, or analysis if we cannot change the allocation in the short term?

•  How  can  we  adjust  the  allocation  dynamically  while  learning  occurs  online? 

•  How  can  we  quantify  the  risk  of  missing  relevant  data  versus  the  amount  of  resources 
allocated? 




•  How  can  we  use  the  data  gathered  to  gain  insights  about  the  population  of  the  sources 
that  generated  the  data? 


We modified the TS algorithm so that we can explore an arbitrarily large number of sources,
limited only by the resources of the intelligence organization. The measure of performance
is the regret: the difference between the rewards that could have been obtained had the
analyst known the true nature of each source of intelligence and the rewards actually
obtained by the algorithm. The expected regret has recently been shown to grow sublinearly
in the number of time periods (T), so that learning does indeed occur. TS, in its base form,
achieves expected regret on the order of O(log T), meaning that the expected average regret
goes to zero as the time horizon T increases.
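
The step from logarithmic regret to vanishing average regret can be written out in one line. The symbols $R(T)$ for the cumulative regret and $C$ for a problem-dependent constant are notation assumed here for illustration and are not taken from the thesis:

```latex
\mathbb{E}[R(T)] \le C \log T
\quad\Longrightarrow\quad
\frac{\mathbb{E}[R(T)]}{T} \le \frac{C \log T}{T}
\xrightarrow[T \to \infty]{} 0 .
```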

Our  main  conclusions  and  contributions  can  be  summarized  as  follows: 


•  The model described can be used to allocate the collection/processing resources/efforts
efficiently.

•  The  suggested  algorithm  employed  in  the  model  yields  a  sublinear  performance  in 
the  simulations  we  conducted,  meaning  that  the  average  regret  tends  to  zero  as  the 
number  of  time  periods  gets  larger. 

•  The  model  can  be  adapted  to  situations  in  which  prior  knowledge  about  the  sources 
exists. 

•  We  consider  the  capacity  allocated  as  possibly  changing  over  time,  as  information 
becomes  available.  With  this  approach,  intelligence  agencies  can  better  control  the 
regret  in  the  exploration  phase  and  avoid  using  excess  capacity. 

•  The  model  can  also  be  employed  to  gain  insights  about  the  risk  of  missing  relevant 
data,  which  provide  further  guidance  for  the  capacity  required. 

•  We also use the Expectation Maximization (EM) algorithm to estimate the distributional
parameters for the statistical model of the candidate subpopulations as data is
collected.
CHAPTER 1:

Introduction 


Knowledge  is  power  only  if  man  knows  what  facts  not  to  bother  with. 

Robert  Staughton  Lynd  [1] 


1.1  Introduction 

The  amount  of  data  available  to  intelligence  agencies  has  skyrocketed  in  recent  years  in 
parallel  to  technological  advances.  Fully  handling  this  huge  amount  of  data  is  beyond  the 
capabilities  (human  and  technological)  of  any  organization  if  done  in  a  naive  manner. 

The  challenge  is  well  summarized  by  Hedley: 

In the twenty-first century, a principal analytic challenge lies in the sheer volume
of information available. Although especially hard targets such as terrorist
cells are no less difficult to penetrate, the explosion of open-source information
from news services and the Worldwide Web makes the speed and volume
of reporting more difficult to sift through. Advances in information technology
both help and hinder, as analysts strive to cope with the “noise,” the chaff
they must winnow away. Data multiply with dizzying speed. Whereas collecting
solid intelligence information was the overriding problem of the past,
selecting and validating it loom ever larger as problems for analysts today. [2]

Intelligence  activities  are  commonly  considered  within  a  framework  called  the  Intelligence 
Cycle. The main phases in the cycle are planning, collection, processing, analysis, and
dissemination of the information. A detailed discussion of these stages is provided in
Chapter 2.

Some  of  the  data  collected  from  the  operational  environment  is  discarded  because  it  is 
irrelevant  by  the  point  when  final  judgments  are  made  by  analysts.  According  to  Wirtz  [3], 
“Observers ritualistically point out that analysts are constantly at risk of being overwhelmed
by  a  deluge  of  information  from  both  open  and  classified  sources.  Yet,  the  real  danger  may 
be  the  fact  that,  within  this  data  stream,  there  is  little  valuable  information  about  the  highest 
priority  targets  and  issues  facing  analysts.” 

Besides  the  relevancy  of  the  data,  the  quality  of  the  source  that  generates  the  data  has  to  be 
understood  in  order  to  focus  on  the  sources  that  are  more  likely  to  produce  relevant  data. 

Online learning is characterized by updating the beliefs about the ground truth, which is
unknown, as new data becomes available. Because the uncertainty is reduced as a result of
exploration, one ought to adapt one's decisions in light of new information. This approach is
applicable to most situations in which decisions are made in sequence. In some
collection contexts, data is collected in a sequential manner that allows analysts to apply
online learning methods to assess the quality of the sources. For example, the sources could
be a number of tracked Twitter accounts. Collecting the tweets costs time and effort. The
data (i.e., messages) from these sources (i.e., Twitter accounts)
become available sequentially. It is straightforward to come up with other similar scenarios
for which online learning methods are germane. In summary, gaining information about
the quality of the sources allows the analyst to select a subset of them, given the capacity
and time constraints.


1.2  Research  Questions  and  Methodology 

In  this  study,  we  adopt  a  model  that  allows  us  to  benefit  from  the  ideas  developed  within  the 
online  learning  community.  We  adapt  the  well-known  Thompson  Sampling  (TS)  algorithm 
to  a  case  in  which  many  samples  can  be  explored  at  a  time.  We  address  the  following 
questions  (see  Chapters  3  and  4  for  more  details): 

•  How  can  we  create  a  balance  between  exploration  and  exploitation  to  maximize  the 
benefit  when  making  decisions  as  to  the  collection,  processing,  or  analysis  of  data  if 
there  are  time  or  resource  constraints? 

•  How can we decide what amount of resources should be allocated for collection,
processing, or analysis if we cannot change the allocation in the short term?

•  How  can  we  adjust  the  allocation  dynamically  while  learning  occurs  online? 

•  How can we quantify the risk of missing relevant data vs. the amount of resources
allocated?

•  How can we use the data gathered to gain insights about the population of the sources
that generated the data?

Initially, we develop a modeling framework that is suitable to answer the aforementioned
questions and then analyze its performance with numerical examples.

1.3 Scope

We intentionally define the model to be analyzed in a generic fashion. For example, the
sources in the model could be discrete portions of a wide geographical area. Items from
these sources are collected with a UAV, processed using specific algorithms at headquarters,
and further analyzed by a human analyst. To keep the model as simple as possible, in this
work, we do not consider specific issues for each setting (e.g., UAV travel time between
non-adjacent geographic locations).

In Table 1.1, we include several other illustrative examples.


Table  1.1:  Examples  for  Sources  and  Capacities  (Resources)  for  Different  Phases 


Phase        Source (Where to sample)                     Resource
----------   ------------------------------------------   ----------------------
Collection   A geographical area;                         A satellite;
             a frequency band;                            signal interceptor
             an edge of a social network

Processing   Data aggregated from the collection phase    Decryption tool;
                                                          automatic translator

Analysis     Processed data;                              Human analysts
             translated, decrypted message;
             restructured data

1.4 Structure of the Thesis

This  thesis  consists  of  five  chapters.  Chapter  2  is  dedicated  to  a  background  on  intelligence 
activities  as  well  as  the  literature  review.  Chapter  3  introduces  the  modeling  framework, 
including  the  assumptions  and  the  proposed  approach.  In  Chapter  4,  we  analyze  the  model 
from  Chapter  3  to  answer  the  research  questions.  Also,  Chapter  4  contains  numerical  results 
obtained  from  the  simulations.  In  Chapter  5,  we  offer  conclusions. 
CHAPTER 2:

Background  and  Literature  Review 


In  Chapter  2,  we  provide  information  about  intelligence  and  online  learning.  Then,  we 
discuss  to  what  extent  the  activities  performed  in  the  intelligence  cycle  are  related  to,  and 
can  be  modeled  by,  online  learning.  Finally,  we  look  at  the  related  studies  on  the  overlap. 


2.1  Intelligence 

Intelligence  is  a  means  to  an  end  [4].  From  a  state  perspective,  the  most  important  goal 
of intelligence is to provide security to the people. Almost all states have dedicated
intelligence agencies. These agencies have similar structures and procedures. Their typical
missions  are  to  collect  relevant  information  and  to  conduct  objective  analyses.  One  of 
the  key  challenges  is  to  leverage  technological  advances  for  better  performance  in  agency 
missions  [5]. 

2.1.1  Definition  and  Categories  of  Intelligence 

Intelligence is an elusive term, so we need to clarify its meaning. Broadly, it may be
defined as information that has been collected, processed, and analyzed for the use of
decision/policy makers. Besides the final product, the term intelligence is also used to refer to
the process through which it is produced, the organization that produces it, or the whole
Intelligence Community [6]. Formal definitions of intelligence from the Department of
Defense Dictionary of Military and Associated Terms [7] are as follows:


The  product  resulting  from  the  collection,  processing,  integration,  evaluation, 
analysis, and interpretation of available information concerning foreign nations,
hostile or potentially hostile forces or elements, or areas of actual or
potential  operations. 

The  activities  that  result  in  the  product. 

The  organizations  engaged  in  such  activities. 
2.1.2 Intelligence Cycle

Although  intelligence  officers  admit  that  effective  intelligence  efforts  are  not  cyclic  [8], 
production  and  consumption  of  the  intelligence  is  traditionally  considered  a  cyclic  process, 
as  shown  in  Figure  2.1.  The  main  steps  in  the  cycle  include  identifying  requirements/needs, 
planning  and  direction,  collection,  processing,  analysis  and  production,  and  dissemination 
[8]. 

The steps or categories of the cycle represent the related activities conducted by the
agencies. Activities in each category may happen concurrently, or some steps may be bypassed.
For example, a requirement may be addressed by analyzing existing data without any
collection effort.

Briefly,  the  cycle  can  be  explained  as  follows: 

•  Consumers  determine  the  requirements  and  the  priorities. 

•  Agencies  plan  all  the  efforts  necessary  through  the  process  until  the  delivery  of  the 
final  product  to  the  consumer. 

•  Data  is  collected  via  intelligence  gathering  disciplines. 

•  Huge  amounts  of  data  are  processed  and  converted  into  a  form  usable  by  the  analysts. 

•  Analysts  determine  the  relevancy  and  the  importance  of  the  data. 

•  The  intelligence  is  disseminated  to  the  consumer  who  demanded  it. 
Figure 2.1: Intelligence Cycle.


Raw data is filtered according to its relevancy (importance) as it goes through the
phases of the intelligence cycle, and it takes on different names along the way: data,
information, and intelligence. The relationship between data, information, and intelligence [9]
is shown in Figure 2.2. The operational environment harbors all the information available
for collection, but we generally collect only a fraction of it. Because we do not know the
exact importance of data before it goes through the entire cycle, the decision regarding
where to collect the data is itself a filtration. After the data is collected, it is
(pre)processed and, therefore, subject to another filtration. Finally, analysts examine the
information and possibly discard some portion of it as irrelevant. Throughout the collection
phase, it is important to collect the least amount of data needed so as to increase
efficiency. This efficiency requirement drives our motivation to apply online learning ideas
when appropriate.


Figure  2.2:  Relationship  between  Data,  Information  and  Intelligence.  Adapted  from  [9]. 


2.1.3  Intelligence  Collection 

In  the  collection  phase,  the  data  is  gathered  from  diverse  domains  with  many  techniques 
or  assets,  varying  from  human  to  very  sophisticated  instruments.  These  differ  according  to 
the  following  intelligence-gathering  disciplines: 

•  Human  Intelligence  (HUMINT):  humans  on  the  ground 

•  Geospatial  Intelligence  (GEOINT):  satellite,  aerial  photography,  mapping/terrain 
data 

•  Measurement  and  Signature  Intelligence  (MASINT):  different  types  of  sensors 

•  Open  Source  Intelligence  (OSINT):  from  all  open  sources 

•  Signals  Intelligence  (SIGINT):  intercepting  the  signals 

•  Technical Intelligence (TECHINT): analysis of technical information of the weapons
and equipment used by foreign nations

•  Cyber Intelligence/Digital Network Intelligence (CYBINT/DNINT): cyber space

•  Financial  Intelligence  (FININT):  analysis  of  monetary  transactions 

Parallel to the advancement of technology, the capabilities of the collection assets,
especially technical ones, constantly improve. However, the required resolution of the
information collected still prevents the agencies from collecting from all possible sources,
e.g., geographical areas, spectrum, Internet traffic. For example, unless an intelligence
satellite's movement is synchronized with the earth’s rotation (i.e., it is geostationary),
it can look at a given portion of the area only for a limited time period. On the other hand,
collecting or processing data from a source may be costly due to encryption, deception, or
other denial techniques [4]. For example, people may exchange encrypted messages within a
social communication network. Here, allocating collection efforts to the candidate sources
has to be done wisely to maximize the value obtained. These typical situations pose an
exploration/exploitation dilemma similar to that of a Multi-Armed Bandit (MAB) problem, as
discussed in Section 2.2.


2.1.4  Intelligence  Processing 

In the collection phase, a large amount of data (especially from SIGINT) is obtained that
requires processing before being delivered to an analyst. Processing may include organizing,
structuring, or translating the data. Given the constraints on the processing efforts and
the large volume of data, complete analysis may be infeasible. Even time-critical data may
be left untouched until a processing effort is allocated.


2.1.5  Intelligence  Analysis 

The flood of information may keep analysts busy just reading the incoming information
without producing any intelligence [10]. Processed intelligence is first put before the
analysts. Daily, they look at the information coming from different sources. Naturally,
analytical capacities often lag behind the collection and processing capacities [4]. Given the
time constraint and the difficulty of analyzing all the information, the exploration and
exploitation of the relevant sources is also applicable to the analysis phase.
2.2 Online Learning

Section  2.2  defines  online  learning  and  examines  how  it  can  be  applied  to  intelligence. 


2.2.1  Overview 

Although  it  was  first  defined  in  machine-learning  literature,  online  learning  can  be  applied 
to  other  areas  such  as  optimization,  game  theory,  and  statistical  modeling. 

According to [11], “The goal of online learning is to make a sequence of accurate
predictions given knowledge of the correct answer to previous prediction tasks and possibly
additional available information.” It is used when learning with training data is infeasible
or the data is non-stationary. Online learning is also used to adapt to changes in the
environment. Learning as one goes along can be more robust than specifying a model and
using mathematical optimization [12].


2.2.2  Multi-Armed  Bandit  Problem 

The MAB problem is one particular setting for online learning. In a MAB problem, an
agent chooses one of K machines (bandits) to play in each game (iteration). In order to
maximize his gain, the agent has to allocate his money wisely between exploring to find the
good bandits and exploiting the information learned.

The problem is important when the decision maker has a budget constraint that prevents
him from learning the truth about each alternative before making a decision. Important
applications of the bandit model include the following:

•  Clinical  trials  investigating  the  effects  of  different  experimental  treatments  while 
minimizing  patient  losses  [13] 

•  Adaptive  routing  efforts  for  minimizing  delays  in  a  network  [14] 

•  Financial  portfolio  design  [15] 

•  Resource  allocation  to  various  projects  given  uncertainty  about  the  difficulty  and 
profit  of  each  possibility  [16] 
2.2.3 Exponential-weight Algorithm for Exploration and Exploitation

Exponential-weight Algorithm for Exploration and Exploitation (Exp3) [17] can be used to
approximately solve the MAB problem as previously described. The performance of Exp3
is measured by weak regret, the difference between the total reward accumulated by the
single best machine and the sum of the rewards obtained by the algorithm throughout the
game history [17].
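
Written as a formula, the weak regret over a horizon $T$ takes the following form. The symbols $x_j(t)$ (reward of machine $j$ at time $t$) and $i_t$ (machine played at time $t$) match the pseudocode of Figure 2.3; the names $G_{\max}$ and $G_{\text{Exp3}}$ are assumed here as labels for the two totals:

```latex
G_{\max}(T) = \max_{1 \le j \le K} \sum_{t=1}^{T} x_j(t),
\qquad
G_{\text{Exp3}}(T) = \sum_{t=1}^{T} x_{i_t}(t),
\qquad
\text{weak regret} = G_{\max}(T) - G_{\text{Exp3}}(T).
```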


Figure 2.3 shows the pseudocode of the algorithm [17]. An obvious downside of the
algorithm is that the parameter $\gamma$, which represents the part of the probability
allocated uniformly to the machines, needs to be provided in advance. As proved by
Auer et al. [17], the optimal $\gamma$ is calculated as follows:

$$\gamma = \min\left\{ 1, \sqrt{\frac{K \ln K}{(e - 1)\, g}} \right\}$$

Here, the parameter $g$ is an upper bound on the total reward of the best machine over the
prospective time horizon $T$ [17]. When rewards lie in the interval $[0, 1]$, this total, and
hence the weak regret, cannot be greater than $T$. Therefore, we can replace $g$ with $T$ in
the equation.
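
As a small numerical illustration of this choice, the sketch below evaluates the expression for $\gamma$ with $g$ replaced by $T$. The function name `exp3_gamma` and its argument names are hypothetical and chosen only for this example.

```python
import math


def exp3_gamma(num_machines: int, horizon: int) -> float:
    """Exploration rate min{1, sqrt(K ln K / ((e - 1) T))}, using g = T.

    Hypothetical helper for illustration; assumes rewards in [0, 1].
    """
    if num_machines < 2 or horizon < 1:
        raise ValueError("need at least 2 machines and a horizon of at least 1")
    ratio = num_machines * math.log(num_machines) / ((math.e - 1.0) * horizon)
    return min(1.0, math.sqrt(ratio))


# With K = 10 machines and T = 1000 periods, gamma is roughly 0.116.
print(round(exp3_gamma(10, 1000), 3))
```

Longer horizons give smaller values of $\gamma$, so a smaller share of the probability is reserved for uniform exploration.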


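Parameters: real $\gamma \in (0, 1]$

Initialization: $w_i(1) = 1$ for $i = 1, \ldots, K$

For each $t = 1, 2, \ldots$

•  set $p_i(t) = (1 - \gamma) \dfrac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \dfrac{\gamma}{K}$ for $i = 1, \ldots, K$

•  draw $i_t$ randomly according to the probabilities $p_1(t), \ldots, p_K(t)$

•  receive reward $x_{i_t}(t) \in [0, 1]$

•  for $j = 1, \ldots, K$ set

$\hat{x}_j(t) = x_j(t) / p_j(t)$ if $j = i_t$, and $\hat{x}_j(t) = 0$ otherwise

$w_j(t + 1) = w_j(t) \exp\left( \gamma \hat{x}_j(t) / K \right)$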

Figure  2.3:  Pseudocode  of  Exp3  Algorithm.  Adapted  from  [17]. 
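
The pseudocode of Figure 2.3 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the thesis: the function name `exp3`, the callback `reward_fn`, and the returned values are assumptions made for this example, and the weight rescaling step is an addition for numerical stability that does not change the probabilities.

```python
import math
import random


def exp3(reward_fn, num_machines, horizon, gamma, rng=None):
    """Run Exp3 for `horizon` rounds.

    reward_fn(machine, t) is a hypothetical callback returning a reward in [0, 1].
    Returns the total reward and the number of times each machine was played.
    """
    rng = rng or random.Random()
    weights = [1.0] * num_machines
    plays = [0] * num_machines
    total_reward = 0.0
    for t in range(horizon):
        weight_sum = sum(weights)
        # Mix the weight-proportional distribution with a uniform share gamma.
        probs = [(1.0 - gamma) * w / weight_sum + gamma / num_machines
                 for w in weights]
        machine = rng.choices(range(num_machines), weights=probs, k=1)[0]
        reward = reward_fn(machine, t)
        # Importance-weighted reward estimate; it is zero for unplayed machines,
        # so only the played machine's weight changes.
        estimate = reward / probs[machine]
        weights[machine] *= math.exp(gamma * estimate / num_machines)
        # Rescale so that weights stay bounded over long horizons (added step).
        largest = max(weights)
        weights = [w / largest for w in weights]
        plays[machine] += 1
        total_reward += reward
    return total_reward, plays
```

For instance, with two machines that pay 1 with probabilities 0.1 and 0.9, most plays should go to the second machine after an initial exploration period.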
In other words, decisions are selected randomly from a probability mass function of bandits,
which is updated at each iteration according to the reward obtained. One salient
characteristic of the algorithm is that a portion of the probability is allocated uniformly
to all of the bandits so as to keep exploring the bandits.


2.2.4  Thompson  Sampling 

TS, as demonstrated in [18], is a heuristic method that can be utilized to address the
exploration and exploitation tradeoff posed by MAB. The idea behind TS is to choose the action
(choice of a machine to play) that has the largest expected reward according to the posterior
reward distributions of the actions. Since the expectations cannot be analytically computed
and it is easier to sample from a posterior distribution, actions are drawn randomly from
the corresponding posterior distributions in each time period; then, the belief distributions
are updated using past observations (Bayesian update).

TS  is  described  in  [19]  as  follows: 

Consider a set of actions $\mathcal{A}$ and rewards in $\mathbb{R}$. In each round, the player
chooses an action $a \in \mathcal{A}$ and obtains a reward $r \in \mathbb{R}$ following a
distribution that depends on the issued action. The aim of the player is to play actions so
as to maximize the cumulative rewards.

The following are elements of Thompson sampling:

•  A likelihood function $P(r \mid \theta, a)$;

•  A set $\Theta$ of parameters $\theta$ of the distribution of $r$;

•  A prior distribution $P(\theta)$ on these parameters;

•  Past observations $\mathcal{D} = \{(a; r)\}$;

•  A posterior distribution $P(\theta \mid \mathcal{D}) \propto P(\mathcal{D} \mid \theta) P(\theta)$.

TS consists in playing the action $a^* \in \mathcal{A}$ according to the probability that it
maximizes the expected reward, i.e.,
$\int \mathbb{I}\left[ E(r \mid a^*, \theta) = \max_{a'} E(r \mid a', \theta) \right] P(\theta \mid \mathcal{D}) \, d\theta$.
In practice, the rule is implemented by sampling, in each time period, a parameter $\theta^*$
from the posterior $P(\theta \mid \mathcal{D})$ and choosing the action $a^*$ that maximizes
$E[r \mid \theta^*, a^*]$, i.e., the expected reward given the parameter and the action [20].
In words, the player selects his beliefs randomly from posteriors, and then acts optimally
according to them [20].
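
For the common special case of binary rewards, the sampling rule above can be sketched with Beta posteriors. This is an illustrative sketch under assumptions that are not stated in the text: rewards are Bernoulli, each machine has an independent uniform Beta(1, 1) prior, and the names `thompson_sampling` and `reward_fn` are hypothetical.

```python
import random


def thompson_sampling(reward_fn, num_machines, horizon, rng=None):
    """Beta-Bernoulli Thompson Sampling, one play per time period.

    reward_fn(machine, t) is a hypothetical callback returning a truthy value
    for a success and a falsy value for a failure.
    """
    rng = rng or random.Random()
    successes = [0] * num_machines
    failures = [0] * num_machines
    total_reward = 0
    for t in range(horizon):
        # Draw one belief per machine from its Beta(successes + 1, failures + 1) posterior.
        beliefs = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(num_machines)]
        # Act optimally with respect to the sampled beliefs.
        machine = max(range(num_machines), key=beliefs.__getitem__)
        if reward_fn(machine, t):
            successes[machine] += 1
            total_reward += 1
        else:
            failures[machine] += 1
    return total_reward, successes, failures
```

Because the Beta distribution is conjugate to the Bernoulli likelihood, the Bayesian update reduces to incrementing a success or failure count, and no tuning parameter is required.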
TS was not common in the literature until recently. Chapelle and Li [19] present “empirical
results using TS on simulated and real data, and show that it is highly competitive,”
compared to other known algorithms, e.g., Upper Confidence Bound (UCB), and robust to
observation delays. Furthermore, TS is easy to implement and takes no parameter that has
to be determined in advance, unlike Exp3.


2.3  Intelligence  and  Online  Learning 

As discussed in Section 1 and demonstrated in Figure 2.2, the data moves through the cycle until its ultimate consumption by the consumer and is subject to filtration along the way. Not all of the collected data will eventually become full-fledged intelligence. It may be discarded as irrelevant or insignificant after being collected, processed, or analyzed. It may not even be collected in the first place because it resides in an irrelevant domain.

The number of items that are discarded as irrelevant in any phase of the intelligence cycle can be an indicator of the efficiency of the process. In other words, we need to collect, process, and analyze the least amount of data necessary to produce a given piece of intelligence without wasting any effort. This can be achieved by collecting the data from the domain in an adaptive manner. The potential of the domain’s parts can be learned as the data are collected and fed into the cycle, which in turn can redirect the collection asset to more promising parts of the domain. The same logic is applicable to the processing and analysis phases.

Online learning principles offer promising guidance regarding how this filtration can be done. The inherent exploration and exploitation dilemmas that exist in all collection, processing, and analysis efforts are conducive to approaching the filtration as an online learning setting such as MAB.

Different settings exposed by collection, processing, and analysis efforts, or even by different techniques employed in each effort, may lead to different assumptions regarding the dependence of the sources (both temporal and spatial). An optimal learning approach can be modeled using modifications of existing learning algorithms/heuristics, such as Exp3 and TS, in pursuit of efficient solutions resulting in the maximum gain with the available collecting, processing, or analyzing capacity.


2.4 Related Studies

To  the  best  of  our  knowledge,  there  are  three  Naval  Postgraduate  School  (NPS)  master’s 
theses  that  touch  upon  efficient  intelligence  collection  and  processing. 


Costica  [21]  considers  the  bottleneck  or  congestion  caused  by  a  huge  information  flow 
and  proposes  a  tandem  queue  model  for  a  preliminary  classification  of  intelligence  items 
regarding  their  relevance  to  an  intelligence  request. 

Nevo  [22]  specifies  a  network  model  for  social  network  communication  and  treats  the  edges 
as  the  sources.  To  maximize  the  relevant  data  discovered,  he  compares  the  performances 
of  several  learning  algorithms  under  the  time  constraints. 

Ellis [23] analyzes the performance of some learning algorithms in detecting relevant conversations from an intercepted social communication network, as represented by the model developed by Nevo.


CHAPTER 3:
Model 


In this chapter, we set the framework for our model and for a parameter estimation approach that addresses efficient data collection.


3.1  Setting 

In order to keep the developments as generic as possible, we define several terms. As described in Chapter 2, the intelligence cycle includes collection, processing, and analysis stages. In each of these stages, items originate from different sources. Examining an item requires a certain amount of resources (capacity), depending on the cycle stage.

For instance, the source could be a certain geographical area. Items from an area are collected with a UAV, processed using specific algorithms at headquarters, and further analyzed by a human analyst. To keep the model as simple as possible, in this work, we do not consider specific issues for each setting (e.g., UAV travel time between non-adjacent geographic locations). In Table 3.1, we include several other illustrative examples.


Table 3.1: Examples for Sources and Capacities (Resources) for Different Phases

Phase        Source (Where to sample)                    Resource
-----------  ------------------------------------------  ---------------------
Collection   A geographical area;                        A satellite;
             a frequency band;                           signal interceptor
             an edge of a social network
Processing   Data aggregated from the collection phase   Decryption tool;
                                                         automatic translator
Analysis     Processed data;                             Human analysts
             translated, decrypted message;
             restructured data


3.2 Model


There are $S$ sources, assumed to be independent. In each time period $t$, source $s \in \{1, \ldots, S\}$ generates a relevant item with probability $p_s$, assumed constant in time, independently of past observations. Only $q < S$ (and often $q \ll S$) sources can be explored in each time period. For simplicity, we assume that there are no misjudgments, meaning that the resource employed to examine the item makes no classification mistakes (see Table 3.1). Exploring a relevant item yields a reward of 1, while a non-relevant item yields a reward of 0. We also assume that the cost of exploring an item from a source is fixed and equal for all sources and is, thus, not explicitly considered in the model. The measure of performance is the total expected reward over some finite horizon $T$. Hence, if the values of $p_s$ for all sources $s$ were known, then simply exploring the $q$ sources with the largest probability $p_s$ of yielding a relevant item would maximize the expected reward.

However, in most realistic situations, the values of $p_s$ are unknown because there is little available information from the source. In this work, we assume that existing and past information can be subsumed into a prior distribution for $p_s$ for each source $s$. On the one extreme, if the analyst has a very high degree of certainty about the value of $p_s$ for some source $s$, then he or she would use a prior density centered at some value in (0,1), with most of the mass concentrated around that value. If, on the other hand, the analyst knows nothing about $p_s$, then he or she would assume a uniform prior. In other words, we model the probabilities of yielding a relevant item for each source, $p_1, \ldots, p_S$, as themselves being randomly drawn from some distribution that may depend on the source $s$.

The beta distribution, with Probability Density Function (pdf)

\[ f(p; \alpha_s, \beta_s) \propto p^{\alpha_s - 1} (1 - p)^{\beta_s - 1}, \]

is proportional to the likelihood of $p$ given $\alpha_s - 1$ successes and $\beta_s - 1$ failures. The beta distribution is appealing because it is conjugate to the Bernoulli distribution, which is associated with $\{0, 1\}$ reward situations such as ours. The beta distribution is continuous over 0 to 1, with mean $\alpha_s / (\alpha_s + \beta_s)$ and variance $\alpha_s \beta_s / [(\alpha_s + \beta_s)^2 (\alpha_s + \beta_s + 1)]$. Under the assumption that $p_s \sim \text{Beta}(\alpha_s, \beta_s)$, the analyst may use historical data to estimate the parameters $\alpha_s$ and $\beta_s$. If there is no historical data, choosing $\alpha_s = \beta_s = 1$ is akin to assuming a uniform prior. A source that with high certainty yields non-relevant items would have $\beta_s \gg \alpha_s > 1$, while one that with low certainty generates relevant items would have $1 > \alpha_s > \beta_s > 0$.

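When source $s$ is explored, an observation $x_s \sim \text{Bernoulli}(p_s)$ is obtained, and the posterior distribution for $p_s$ becomes $\text{Beta}(\alpha_s + x_s, \beta_s + 1 - x_s)$, for $x_s \in \{0, 1\}$.

Hence, after $n_{s,t}$ explorations of items generated by source $s$ by period $t$, we have

\[ p_s \mid x_{s,1}, \ldots, x_{s,n_{s,t}} \sim \text{Beta}(\alpha_s + y_{s,t},\; \beta_s + n_{s,t} - y_{s,t}), \]

where $y_{s,t} = \sum_{i=1}^{n_{s,t}} x_{s,i}$ is the number of relevant items generated by source $s$ by time $t$.

As source $s$ is explored, the analyst gains a degree of certainty about its probability of generating relevant items, since its variance, equal to

\[ \frac{(\alpha_s + y_{s,t})(\beta_s + n_{s,t} - y_{s,t})}{(\alpha_s + \beta_s + n_{s,t})^2 (\alpha_s + \beta_s + n_{s,t} + 1)} = O\left(\frac{1}{n_{s,t}}\right), \]

decays to zero at a rate of order one over the number of explorations at that source.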
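The conjugate update and the shrinking posterior variance can be checked numerically. The sketch below is illustrative only; the helper names `beta_posterior` and `beta_mean_var` and the example observations are assumptions, not part of the thesis.

```python
def beta_posterior(alpha, beta, observations):
    """Return the Beta parameters after observing a list of 0/1 outcomes."""
    y = sum(observations)   # number of relevant items observed
    n = len(observations)   # number of explorations
    return alpha + y, beta + n - y


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


if __name__ == "__main__":
    # Uniform prior Beta(1, 1), then four explorations with three relevant items.
    a, b = beta_posterior(1, 1, [1, 0, 1, 1])
    print("posterior:", (a, b), "mean, variance:", beta_mean_var(a, b))
    # Ten times as much data with the same relevant fraction: smaller variance.
    print("more data:", beta_mean_var(*beta_posterior(1, 1, [1, 0, 1, 1] * 10)))
```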

Each time period the analyst has to decide which sources to explore, in a way that allows her or him to balance the sources with high uncertainty about $p_s$ (i.e., $\alpha_s + \beta_s + n_{s,t}$ small) against those that are likely to yield relevant items (i.e., those with $\alpha_s + y_{s,t} \gg \beta_s + n_{s,t} - y_{s,t}$). Intuitively, the analyst should choose to explore the items from the $q$ sources with the largest chance of generating relevant items.

Recently, it has been shown that an approach known as Thompson Sampling [24] (referred to as TS throughout this work) achieves the best learning rate for this situation from a theoretical standpoint, as indicated by [25]. Our model is very similar to TS, with one main difference: whereas in TS one can only sample a single source at a time, here we can sample $q$ sources per time period. TS examines items from the sources with the greatest probability of generating a relevant item. This is done in two steps: first, by drawing a random $\tilde{p}_s$ from the posterior distribution of each source and, second, by exploring items from the sources with the largest values of $\tilde{p}_s$ drawn in the first step. We summarize these steps as follows:

1. Draw a sample $\tilde{p}_s$ for each source $s$ from a $\text{Beta}(\alpha_s + y_{s,t}, \beta_s + n_{s,t} - y_{s,t})$ and sort the values in increasing order $\tilde{p}_{(1)}, \tilde{p}_{(2)}, \ldots, \tilde{p}_{(S)}$.

2. From the sources corresponding to $\tilde{p}_{(S-q+1)}, \ldots, \tilde{p}_{(S)}$, draw a sample from $\text{Bernoulli}(p_s)$.


Agrawal and Goyal [24] have shown that, for $q = 1$, the expected regret

\[ E[\text{Regret}(T)] = \sum_{t=1}^{T} E\left[\max_s\{p_s\} - p_{s_t}\right], \]

where $s_t$ is the source sampled at time $t$, grows like

\[ E[\text{Regret}(T)] = O\left( \left( \sum_{i=1,\, i \neq s^*}^{S} \frac{1}{(\max_s\{p_s\} - p_i)^2} \right) \log T \right), \tag{3.1} \]

where $s^* = \arg\max_s p_s$, and a function $f(T) = O(\log T)$ if $|f(T)| \le k \log T$ for all $T > 0$ and some $0 < k < \infty$. In words, the expected difference between the total reward gained by exploring the best source if we knew all the reward probabilities $p_s$ in advance, and the total reward obtained by the previous algorithm (that is, the expected regret), grows at order $O(S \log T)$. This implies that the average regret goes to zero, meaning that learning occurs. Lai and Robbins [25] show that this learning rate is optimal, in the sense that it is not possible to have the regret grow slower than $O(\log T)$. Notably, the dominating term driving the $O(\cdot)$ rate of growth in the regret is one divided by the square of the smallest difference between the best and second best sources. This is because the algorithm takes a long time to find the best source when its probability of yielding a relevant item is similar to that of the second best source.


We view $q$ as a decision variable for the intelligence organization. In this setting, the regret at time $t$ is the (random) difference between the reward obtained by drawing a sample from each of the best $q$ sources and the reward attained by exploring the sources selected at time $t$, whose probabilities of yielding a relevant item are $p_{1,t}, \ldots, p_{q,t}$.

Thus, the expected regret is

\[ E[\text{Regret}(T)] = \sum_{t=1}^{T} E\left[ \sum_{s=S-q+1}^{S} p_{(s)} - \sum_{i=1}^{q} p_{i,t} \right], \]

where $p_{(1)} \le p_{(2)} \le \cdots \le p_{(S)}$.
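As a small worked illustration of this definition, the sketch below computes the expected regret for a given sequence of selections when the true probabilities are known. It assumes the selections are supplied as lists of source indices; the names `per_period_regret` and `expected_regret` and the example numbers are hypothetical.

```python
def per_period_regret(probs, chosen, q):
    """Expected reward of the best q sources minus that of the chosen sources."""
    best = sum(sorted(probs, reverse=True)[:q])   # sum of the q largest p_s
    got = sum(probs[s] for s in chosen)           # sum of p_s over selected sources
    return best - got


def expected_regret(probs, selections, q):
    """Sum the per-period regret over a sequence of selections (one list per period)."""
    return sum(per_period_regret(probs, chosen, q) for chosen in selections)


if __name__ == "__main__":
    probs = [0.1, 0.8, 0.3, 0.6]
    # Period 1 picks the two best sources, period 2 picks the two worst.
    print(expected_regret(probs, [[1, 3], [0, 2]], q=2))
```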

Because the intelligence organization typically pools its resources, the value of $q$ (i.e., a measure of the resources devoted to the request for information under consideration) can change over time, but its average value is upper bounded. This relaxation motivates the following questions. How should $q$ change over time, subject to a bound on its mean value? What is the risk associated with any given $q$? How should we adjust $q$ in future time periods? These questions are the subject of Chapter 4.


3.3  Parameter  Estimation  after  the  Data  Is  Available 

In some cases, it makes sense for the analyst to assume that the items from a source come from a mixture of distributions, each distribution corresponding to a particular component (e.g., age group). More precisely, the analyst assumes that $p_s$ has a mixture distribution,

\[ f(p_s) = \sum_{i=1}^{n} w_i f_i(p_s), \]

where $n$ is the number of components, $w_i$ and $f_i(\cdot)$ are the weight and density function of the $i$th component, respectively, with the constraint $\sum_i w_i = 1$.

This is often complicated by the fact that the source class may not be observable; that is, the analyst has a collection of zeros and ones from a source, but does not know the component of each item. To handle this scenario, we use the Expectation Maximization (EM) algorithm. EM can be used to estimate the weights and the parameters of the component distributions when the information about which component an observation belongs to is missing [26]. Employing EM, we can do the following:

• Estimate the size of each component (weights)

• Estimate the parameters of each component

• Estimate the component of each observation


EM alternates between two steps: expectation and maximization. In the first step, a conditional expectation of the log-likelihood is calculated. In the second step, the parameters that maximize this expectation are found [27]. Under suitable regularity conditions, the estimates converge to a local maximum of the likelihood [27]. For this purpose, we use the betareg package of R, which supports beta mixture models. For the details of the implementation for beta mixtures within the package, see Grün et al. [28].

For a formal explanation of EM, let $X$ be the vector of observations from the mixture distribution, and let $Z$ be the vector indicating the components, which are unknown (hidden). $\theta_t$ is the vector of parameters to be estimated in iteration $t$. Then:

Expectation Step: Determine the conditional expectation $E_{Z \mid X, \theta_t}\left[\log P(X, Z \mid \theta)\right]$

Maximization Step: Find the $\theta$ that maximizes this expectation

In our case, the densities $f_i(\cdot)$ are beta with parameters $\alpha_i$ and $\beta_i$. Although we show the results for beta mixtures, which are the most convenient for our purpose, EM can also be implemented for other mixture models, or available packages can be employed.

Expectation Step: Calculate

\[ t_i = E(\log p_i \mid X_i = x_i) = \Psi(\alpha_{old} + x_i) - \Psi(\alpha_{old} + \beta_{old} + N), \]

\[ s_i = E(\log(1 - p_i) \mid X_i = x_i) = \Psi(\beta_{old} + N - x_i) - \Psi(\alpha_{old} + \beta_{old} + N), \]

for $i = 1, \ldots, n$, where $N$ is the number of trials.

Karlis [29] proposes a scheme to update the current estimates of the parameters of the Beta distribution as follows:

Maximization Step: Make a one-step-ahead Newton-Raphson iteration for ML estimation of a beta density using the expectations of the E-step. To do so, calculate

\[ \bar{t} = \frac{\sum_{i=1}^{n} t_i}{n}, \qquad \bar{s} = \frac{\sum_{i=1}^{n} s_i}{n}, \]

and, then, update the estimates as

\[ \alpha_{new} = \alpha_{old} - \frac{\Psi(\alpha_{old}) - \Psi(\alpha_{old} + \beta_{old}) - \bar{t}}{\Psi_3(\alpha_{old}) - \Psi_3(\alpha_{old} + \beta_{old})}, \]

\[ \beta_{new} = \beta_{old} - \frac{\Psi(\beta_{old}) - \Psi(\alpha_{new} + \beta_{old}) - \bar{s}}{\Psi_3(\beta_{old}) - \Psi_3(\alpha_{new} + \beta_{old})}. \]
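To make the two steps concrete, the following sketch performs one E-step and one Newton-Raphson update for a single beta component. It is a reading of the scheme under stated assumptions, not the betareg implementation: each observation $x_i$ is taken to be a count of relevant items out of $N$ trials, the names `e_step` and `newton_step` are hypothetical, and the digamma and trigamma functions are approximated with a standard recurrence plus asymptotic series because the Python standard library does not provide them.

```python
import math


def digamma(x):
    """Approximate digamma for x > 0: shift x up to >= 6, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Approximate trigamma for x > 0 with the same shift-and-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi1(x) = psi1(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi1(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def e_step(xs, n_trials, alpha, beta):
    """Return the lists t_i = E[log p_i | x_i] and s_i = E[log(1 - p_i) | x_i]."""
    total = digamma(alpha + beta + n_trials)
    t = [digamma(alpha + x) - total for x in xs]
    s = [digamma(beta + n_trials - x) - total for x in xs]
    return t, s


def newton_step(xs, n_trials, alpha, beta):
    """One E-step followed by one Newton-Raphson update of (alpha, beta)."""
    t, s = e_step(xs, n_trials, alpha, beta)
    t_bar = sum(t) / len(t)
    s_bar = sum(s) / len(s)
    alpha_new = alpha - (digamma(alpha) - digamma(alpha + beta) - t_bar) / (
        trigamma(alpha) - trigamma(alpha + beta))
    # The beta update uses the freshly updated alpha, as in the scheme above.
    beta_new = beta - (digamma(beta) - digamma(alpha_new + beta) - s_bar) / (
        trigamma(beta) - trigamma(alpha_new + beta))
    return alpha_new, beta_new


if __name__ == "__main__":
    print(newton_step([3, 5, 4, 6, 2], n_trials=10, alpha=1.0, beta=1.0))
```

A single Newton step is not guaranteed to keep the parameters positive for every data set, so a practical implementation would likely add a safeguard and iterate until the estimates stabilize.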


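$\Psi(\cdot)$ denotes the digamma function:

\[ \Psi(x) = \frac{d}{dx} \ln(\Gamma(x)) = \frac{\Gamma'(x)}{\Gamma(x)}, \]

$\Psi_3(\cdot)$ denotes the trigamma function:

\[ \Psi_3(x) = \frac{d}{dx} \Psi(x) = \frac{d^2}{dx^2} \ln(\Gamma(x)), \]

and $\Gamma(\cdot)$ denotes the gamma function for positive real numbers:

\[ \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt. \]


CHAPTER 4: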
Analysis 


As our main contribution in this thesis to the ongoing effort to collect, process, and analyze data for intelligence, we extend the TS framework so that more than one item can be explored each time period. As shown in Chapter 3, the capacity $q$ is a decision variable for the intelligence organization. While there is no variable cost for exploring a source, there is a limit to how large $q$ can be in each time period or on average. The rationale for this is that the resources of the intelligence organization (e.g., technological or human) are viewed as a fixed cost, to be used at will.

Exploring more than one source per time period triggers a change in the interpretation of expected regret. In our case, we calculate expected regret as follows:

\[ E[\text{Regret}(T)] = \sum_{t=1}^{T} E\left[ \sum_{s=S-q+1}^{S} p_{(s)} - \sum_{i=1}^{q} p_{i,t} \right], \]

as shown in Chapter 3. In this setting, the regret at time $t$ is the (random) difference between the reward obtained by drawing a sample from each of the best $q$ (unknown) sources and the reward attained by exploring the sources selected at time $t$, whose probabilities of yielding a relevant item are $p_{1,t}, \ldots, p_{q,t}$.

In this thesis, we sample $q$ different sources each time period, but there are other possible ways to use a capacity of $q$, such as sampling $q$ items from the same source. However, the operational settings in which such an approach is possible are limited and are, thus, omitted from consideration here.

In this chapter, we explain the algorithm in detail, and we interpret the results obtained from the simulations. First, in Section 4.2, we look at the performance of the algorithm when the $p_s$ are from either a pure or a mixture population. The goal in that section is to get a feel for the algorithm’s behavior. Does the expected regret grow logarithmically in time? How is the expected regret affected by prior knowledge?

Then, in Section 4.3, we treat $q$ as a decision variable, allowing it to change over time. We use a normal approximation to find the capacity required to detect a certain percentage of the relevant items. While relatively simple, this is another contribution of our analysis.

In Section 4.4, we develop the notion of risk. We view risk as the expected fraction of relevant sources unexplored in a time period. We analyze the trade-off between the resources allocated and the risk assumed.

In some cases, it makes sense for the analyst to have a prior for $p_s$ that is a mixture distribution, for instance, when intelligence agencies categorize the population into an innocent group and a dangerous group, or when a satellite takes pictures in areas of interest and past non-interest. Hence, it is important to include such scenarios in our analysis. With this motivation, in Section 4.5, we employ the EM algorithm in situations wherein the intelligence analyst has historical data that stem from a mixture distribution and can be used to build a prior for the parameters $p_s$. While this section is somewhat disconnected from the other parts of this chapter, we view it as important because, in most realistic situations, past data is available.


4.1  Algorithm 

In this section we discuss the algorithms employed for the analysis. Because the intelligence organization typically pools its resources, the value of $q$ (i.e., a measure of the resources devoted to the request for information under consideration) can change over time, but its average value is upper bounded. This relaxation motivates the following questions. How does $q$ change over time, subject to a bound on its mean value? What is the risk associated with any given $q$? How should we adjust $q$ in future time periods? These questions are addressed in the remainder of this chapter.

We assume that the probability that source $s$ generates a relevant item ($p_s$) comes from some arbitrary distribution with support over [0, 1], meaning that for source $s$ nature draws a sample $p_s$, which is then used to generate the rewards from a Bernoulli distribution with parameter $p_s$. The analyst updates the beta parameters as discussed in Chapter 3. From the analyst’s standpoint, he or she has a prior distribution for $p_s$, which does not necessarily coincide with the true underlying distribution of $p_s$. The analyst faces a number of different scenarios, depending on whether there is historical data, on the assumptions he or she makes about the true distribution of $p_s$, and on the type of information revealed:

1. There is no prior information about $p_s$, so the analyst assumes a uniform distribution over (0,1), i.e., Beta(1,1). The $p_s$ are drawn from some arbitrary distribution, for example, a mixture of a Beta and a triangular density over [0, 1], as shown in Figure 4.6 and Figure 4.7.

2. Nature sets the distribution of $p_s$ as Beta, and the analyst knows this. If there is no historical data, we end up as in the first scenario (this is the case of Figure 4.2). If there is historical data available, the analyst estimates $\alpha_s$ and $\beta_s$ using maximum likelihood estimation.

3. The analyst knows that nature issues a mixture of Beta distributions for $p_s$, for instance, $p_s \sim 0.9\,\text{Beta}(1,5) + 0.1\,\text{Beta}(5,1)$. The analyst observes the sequence of zeros and ones from each source. However, the analyst does not know from which of the components, either Beta(1,5) or Beta(5,1) in the preceding example, the data originates, nor does he or she know the mixing probabilities (0.1 and 0.9 in the example). In this case, the analyst uses historical data along with the EM algorithm to estimate the Beta parameters ($\alpha_s$ and $\beta_s$ that appear in the algorithm of Figure 4.1, in 1(b)iii) as well as the mixing probabilities. This scenario is discussed in Section 4.5.

The main algorithm is shown in Figure 4.1.

1. Initialization:

   (a) Set $t = 0$.

   (b) For each source $s$:

       i. Draw $p_s$ from some arbitrary density.

       ii. Set $y_{s,t} = 0$ and $n_{s,t} = 0$.

       iii. Set $\alpha_s = 1$, $\beta_s = 1$ (or use EM to estimate them).

2. Draw a sample $\tilde{p}_s$ for each source $s$ from a $\text{Beta}(\alpha_s + y_{s,t}, \beta_s + n_{s,t} - y_{s,t})$ and sort the values in increasing order $\tilde{p}_{(1)}, \tilde{p}_{(2)}, \ldots, \tilde{p}_{(S)}$.

3. From the $q$ sources corresponding to $\tilde{p}_{(S-q+1)}, \ldots, \tilde{p}_{(S)}$, draw a sample $x_s$ from $\text{Bernoulli}(p_s)$, where $p_s$ is the true probability drawn in Step 1(b)i.

4. Set $y_{s,t+1} = y_{s,t} + 1$ if $x_s = 1$ for sampled sources; otherwise set $y_{s,t+1} = y_{s,t}$.

5. Set $n_{s,t+1} = n_{s,t} + 1$ for all $q$ sampled sources; for the other sources set $n_{s,t+1} = n_{s,t}$.

6. Set $t = t + 1$.

7. Go back to 2.

Figure 4.1: Pseudocode of the Algorithm for Arbitrary Priors
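The pseudocode of Figure 4.1 can be turned into a short simulation. The sketch below is one possible rendering under stated assumptions: the true probabilities are passed in by the caller instead of being drawn in the initialization step, every source shares the same prior parameters `alpha0` and `beta0`, and the function name `simulate_top_q_ts` is hypothetical. It also records the per-period expected regret so that curves like those discussed in Section 4.2 can be produced.

```python
import random


def simulate_top_q_ts(true_probs, q, horizon, alpha0=1.0, beta0=1.0, seed=0):
    """Run the top-q Thompson sampling loop and return (regret per period, y, n)."""
    rng = random.Random(seed)
    num_sources = len(true_probs)
    y = [0] * num_sources   # relevant items found per source
    n = [0] * num_sources   # explorations per source
    best_q = sum(sorted(true_probs, reverse=True)[:q])
    regret = []
    for _ in range(horizon):
        # Step 2: one posterior draw per source.
        draws = [rng.betavariate(alpha0 + y[s], beta0 + n[s] - y[s])
                 for s in range(num_sources)]
        # Step 3: explore the q sources with the largest draws.
        chosen = sorted(range(num_sources), key=draws.__getitem__, reverse=True)[:q]
        for s in chosen:
            x = 1 if rng.random() < true_probs[s] else 0   # Bernoulli(p_s) outcome
            y[s] += x                                      # Step 4
            n[s] += 1                                      # Step 5
        regret.append(best_q - sum(true_probs[s] for s in chosen))
    return regret, y, n


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.1, 0.1, 0.05, 0.05]
    regret, y, n = simulate_top_q_ts(probs, q=2, horizon=500, seed=3)
    print("regret in first 100 periods:", round(sum(regret[:100]), 3))
    print("regret in last 100 periods:", round(sum(regret[-100:]), 3))
    print("explorations per source:", n)
```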


4.2  Results 

In this section we analyze the performance of the algorithm through numerical experiments. We consider two scenarios. In the first scenario, the $p_s$ are sampled from one population ($s = 1, \ldots, S$, where $S$ is the number of sources). In the second scenario, the $p_s$ are sampled from two populations. We show the results in terms of cumulative regret and regret per time period.


4.2.1  When  ps  are  Sampled  from  One  Population 

In this subsection, we treat the simplest scenario, one in which all the parameters $p_s$ are independent and identically distributed from a Beta distribution. In particular, we assume that $p_s \sim \text{Beta}(0.02, 0.18)$. The density is shown in Figure 4.2. This assumption implies that sources have mostly either low or high $p_s$ values and rarely intermediate values, capturing situations wherein the population rarely produces relevant items, and those items that are relevant come from a small subset of the population.

The regret realized in each time period is shown in Figure 4.4, and the cumulative regret appears in Figure 4.3, with 95% confidence interval bands obtained from 200 simulation replications ($q = 20$ and $S = 100$). The per-period regret decays toward zero as learning takes place and the sources with the largest $p_s$ become more likely to be sampled.

Figure 4.2: Pdf of the Distribution of the $p_s$ (panel title: Pdf for One Population; horizontal axis: $p_s$)

Figure 4.3: Cumulative Regret for One Population (panel title: Regret with One Population)

Figure 4.4: Non-cumulative Regret for One Population (panel title: Non-Cumulative Regret with One Population)

In Figure 4.5, we plot the average cumulative regret over 200 sample paths as a function of $\log t$, as well as the 95% confidence interval bands. The motivation is Equation 3.1: the expected cumulative regret grows at order $\log t$. The average cumulative regret does not appear to grow linearly in $\log t$ up to $t = \exp(6) \approx 400$, but it does thereafter. This is not in disagreement with Equation 3.1, as it only applies as $t$ grows larger. We believe this is because we sampled $q = 20$ sources per period, while Equation 3.1 applies to the case in which $q = 1$. In other words, we accrue regret over the poor sources that get sampled among the 20 selected sources per time period.

Figure 4.5: Regret for One Population (panel title: Regret vs. Log(t) with One Population)


4.2.2 When ps are Sampled from Two Populations

In this subsection, we consider a mixture distribution for the priors, with known mixing probabilities. This will be relaxed later on, when we use the EM algorithm to estimate the mixing probabilities. This setting is appropriate for situations in which a subpopulation group is viewed as a source, with the mixing probability representing the weight of the subpopulation in the broader population. In particular, we assume that the probability of producing a relevant item ($p_s$) is drawn from a Beta(0.05, 0.95) distribution for 99% of the sources and from a Triangular(0,1,1) distribution for the remaining 1%.
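A minimal sketch of how such source probabilities could be generated is given below. It assumes that Triangular(0,1,1) denotes a triangular density on [0, 1] with its mode at 1 (minimum, maximum, mode), and that each source is assigned to a population independently with the stated weights; the function name `draw_source_probabilities` is hypothetical.

```python
import random


def draw_source_probabilities(num_sources, rng, weight_beta=0.99):
    """Draw p_s for each source from the two-population mixture described above."""
    probs = []
    for _ in range(num_sources):
        if rng.random() < weight_beta:
            # Majority population: Beta(0.05, 0.95), mostly near zero.
            probs.append(rng.betavariate(0.05, 0.95))
        else:
            # Minority population: triangular on [0, 1] with mode 1 (assumed reading).
            probs.append(rng.triangular(0.0, 1.0, 1.0))
    return probs


if __name__ == "__main__":
    probs = draw_source_probabilities(100, random.Random(0))
    print("largest p_s:", max(probs), "mean p_s:", sum(probs) / len(probs))
```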

The initial prior is Beta(1,1), but as the number of rewards observed grows larger, its impact becomes relatively smaller, and the updated values of $\alpha$ and $\beta$ (cf. Line 2 of the algorithm in Figure 4.1) eventually force the algorithm to emphasize exploring the best sources.

The densities of both distributions are shown in Figure 4.6 and Figure 4.7. This assumption is reasonable for screening communication items because most people are innocent and have a low probability of generating a relevant item (e.g., e-mail or phone conversation), while a very small percentage of people (e.g., criminals or terrorist suspects) have a higher probability of generating relevant items. Regret in each time period is shown in Figure 4.9, and cumulative regret is shown in Figure 4.8. As seen in the figures, the mixture of source populations produces a regret pattern similar to that of the one-population scenario.

Figure 4.6: Pdf 1

Figure 4.7: Pdf 2

Figure 4.8: Cumulative Regret for Two (Mix) Populations (panel title: Regret with Two Populations)

Figure 4.9: Non-cumulative Regret for Two (Mix) Populations (panel title: Non-Cumulative Regret with Two Populations; horizontal axis: Time)

Figure 4.10: Regret for Two Populations (panel title: Regret vs. Log(t) with Two Populations)

As shown in Figure 4.10, we plot the average cumulative regret in terms of $\log t$, including 95% confidence bands, based on 200 replications, for $q = 20$ sources sampled per time period. As with Figure 4.5, the expected regret appears to grow linearly in $\log t$ only for values of $t$ larger than approximately 60. In this case, we have the extra difference, relative to Equation 3.1, that the distribution of the $p_s$ is a mixture of a beta and a triangular density, which is unknown to the analyst, whose prior is initialized as Beta(1,1) in the simulation.

4.3 Determining the Number of Sources to Sample (q)

An output of a single replication of the simulation for $S = 100$ sources, during which the analyst samples $q = 10$ sources per time period, is shown in Figure 4.11 (the $p_s$ are sampled as described in Subsection 4.2.2). We observe that, after an exploration period, the regret per time period stabilizes under the red curve, which shows the real number of relevant items throughout the time periods. The blue line represents the number of relevant items discovered.

Clearly, the larger the number of sources explored per time period, the larger the expected number of relevant items discovered. Accordingly, the blue line approaches the red line when $q$ is increased from 10 (Figure 4.11) to 20 (Figure 4.12). Another observation is that the increase in $q$ almost cuts in half the number of time periods required for the curve to stabilize (from about 100 to 50).

This raises the question: How does the expected regret change as a function of the number of sources explored per time period ($q$)? Our goal in this section is to address this issue.

In Figure 4.13, we show the regrets obtained for different $q$ values ($q = 1, 10, 20, 40, 60, 80$). After the exploration phase, the regret grows like a constant $C$ times the logarithm of $T$, wherein the proportionality constant depends on $q$. As shown in Equation 3.1, the growth constant depends on how similar the best source is to the other sources when $q = 1$. However, when $q$ takes on other values, we conjecture that it depends on how similar the best $q$ sources are to the rest of the sources. We observe that the growth constant for $q = 1$ is greater than that for $q = 10$. The rationale for this is that we sample $q > 1$ sources per period, so that there is extra regret due to the poor sources that get sampled among the $q$ selected.

Figure 4.11: $q = 10$ (horizontal axis: Time)

Figure 4.12: $q = 20$ (horizontal axis: Time)


Let $Y_t$ be the number of relevant items in period t across all sources. The posterior
distribution of $Y_t$ given the exploration to date is that of a sum of S independent
Bernoulli random variables, $\sum_{s=1}^{S} X_{s,t}$. Each $X_{s,t}$ is Bernoulli with
parameter $p_{s,t} \sim \mathrm{Beta}(\alpha_{s,t}, \beta_{s,t})$ for
$\alpha_{s,t} = \alpha_s + y_{s,t}$ and $\beta_{s,t} = \beta_s + n_{s,t} - y_{s,t}$. The
interpretation, as in Chapter 3, is that $\alpha_{s,t}$ is the prior parameter plus the
number of relevant items explored from source s to date, and $\beta_{s,t}$ is the initial
$\beta_s$ plus the number of irrelevant items in source s in the initial t periods.
Therefore,

$$E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1] = \sum_{s=1}^{S} \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}}, \tag{4.1}$$

$$\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1) = \sum_{s=1}^{S} \mathrm{Var}(X_{s,t}).$$

Figure 4.13: Regret Obtained by Different q Values


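Since the $p_{s,t}$ are also random, we may employ the total variance formula for
$\mathrm{Var}(X_{s,t})$:

$$
\begin{aligned}
\mathrm{Var}(X_{s,t}) &= E[\mathrm{Var}(X_{s,t} \mid p_{s,t})] + \mathrm{Var}(E[X_{s,t} \mid p_{s,t}]) \\
&= E[p_{s,t}(1 - p_{s,t})] + \mathrm{Var}(p_{s,t}) \\
&= E[p_{s,t}] - E[p_{s,t}^2] + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)} \\
&= \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}} - \frac{\alpha_{s,t}(\alpha_{s,t} + 1)}{(\alpha_{s,t} + \beta_{s,t})(\alpha_{s,t} + \beta_{s,t} + 1)} + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)},
\end{aligned}
$$

where the last two steps use the fact that the $p_{s,t}$ are Beta distributed.

We conclude that

$$
\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1)
= \sum_{s=1}^{S} \left[ \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}} - \frac{\alpha_{s,t}(\alpha_{s,t} + 1)}{(\alpha_{s,t} + \beta_{s,t})(\alpha_{s,t} + \beta_{s,t} + 1)} + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)} \right]. \tag{4.2}
$$

Observe that the posterior variance of the total number of relevant items by period t
decays toward the variance of a sum of Bernoulli random variables, considered the systemic
variance, as exploration eliminates the variance due to the uncertainty about the $p_s$'s.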
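
As a check on Equations 4.1 and 4.2, the following minimal Python sketch computes the posterior mean and variance of $Y_t$ from per-source Beta parameters. The function name `posterior_moments` and the example inputs are illustrative assumptions, not part of the thesis code. Putting the three terms of each summand in Equation 4.2 over a common denominator gives $\mu_{s,t}(1 - \mu_{s,t})$ with $\mu_{s,t} = \alpha_{s,t}/(\alpha_{s,t} + \beta_{s,t})$, and the last lines confirm this numerically for one case.

```python
from math import isclose


def posterior_moments(alpha, beta):
    """Return (mean, variance) of Y_t given per-source Beta parameters.

    alpha[s] and beta[s] play the roles of alpha_{s,t} and beta_{s,t}.
    """
    mean = 0.0
    var = 0.0
    for a, b in zip(alpha, beta):
        n = a + b
        mean += a / n                            # summand of Equation 4.1
        var += (a / n
                - a * (a + 1) / (n * (n + 1))
                + a * b / (n ** 2 * (n + 1)))    # summand of Equation 4.2
    return mean, var


# Hypothetical example: uniform Beta(1, 1) priors on S = 100 sources.
mean0, var0 = posterior_moments([1.0] * 100, [1.0] * 100)
print(mean0, var0)

# Single source with alpha = 3, beta = 7: the variance should equal mu * (1 - mu).
mu = 3.0 / 10.0
_, var_one = posterior_moments([3.0], [7.0])
print(isclose(var_one, mu * (1 - mu)))
```

With uniform priors each source contributes 1/2 to the mean and 1/4 to the variance, so the first call gives a mean of 50 and a variance of 25 up to floating-point rounding.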

The central limit theorem suggests that the total number of relevant items at time t, given
by $Y_t$, is approximately normally distributed when the number of sources is large. This
motivates the study of two different scenarios. In the first scenario, we assume that the
number of sources to sample is decided up front and must remain constant thereafter. In the
second scenario, the analyst can change the number of sources to sample dynamically. In
both cases, the goal of the analyst is to sample as many sources as needed, so he or she
has a 95% probability of capturing the reward of a source; that is, on average, the analyst
collects 95% of the total rewards.

Thus, we use a normal approximation to provide an upper bound with confidence c:

$$\text{Upper Bound} = E[Y_0] + \Phi^{-1}(c)\,\sigma_{Y_0}, \tag{4.3}$$

where $E[Y_0]$ is the sum of all the prior means, and $\sigma_{Y_0}$ is the standard
deviation of $Y_0$ at time zero, that is, the square root of the sum of the prior variances
$\mathrm{Var}(X_{s,0})$.

Using these equations, we are able to provide an upper bound for $Y_t$ with confidence level
c. Then, it will be reasonable for q to be greater than or equal to this bound. Here, c can be
regarded as another decision variable.

In order to capture all relevant items after the exploration phase, the allocated capacity
has to be greater than or equal to the number of relevant items in each iteration ($q \ge Y_t$).
Because $Y_t$ is a random variable, it is safe to choose a value for q that is greater than or
equal to the value obtained by the upper bound in Equation 4.3.

The corresponding upper bounds are shown in Table 4.1. In other words, the table gives the
minimum number of sources to explore, q, for a given number of sources S at a 0.99
confidence level (c). Although these bounds are for the prior Beta(0.02, 0.18), providing
similar bounds for any distribution of the $p_s$ seems straightforward.

Table 4.1: Minimum q Values Required

S        q
10       4
20       6
50       10
100      17
500      66
1000     123
10000    1070
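
The entries of Table 4.1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the thesis code: it takes $\sigma_{Y_0}$ to be the square root of the summed prior variances, uses $\mathrm{Var}(X_{s,0}) = \mu(1 - \mu)$ with $\mu = a/(a + b)$ for a Beta(a, b) prior, and rounds the bound of Equation 4.3 up to the next integer. The function name `min_capacity` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_capacity(S, a=0.02, b=0.18, c=0.99):
    """Smallest integer q at or above the Equation 4.3 bound for S sources
    that share a Beta(a, b) prior."""
    mu = a / (a + b)                     # prior mean of each p_s
    mean_y0 = S * mu                     # E[Y_0]: sum of the prior means
    sd_y0 = sqrt(S * mu * (1.0 - mu))    # square root of the summed Var(X_{s,0})
    z = NormalDist().inv_cdf(c)          # Phi^{-1}(c)
    return ceil(mean_y0 + z * sd_y0)


for S in (10, 20, 50, 100, 500, 1000, 10000):
    print(S, min_capacity(S))
```

Under these assumptions the printed values are 4, 6, 10, 17, 66, 123, and 1070, which agree with Table 4.1. Changing `a`, `b`, or `c` gives the corresponding bound for another prior or confidence level.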

If the analyst must choose the value of q in advance, he or she should follow the
recommendation above. On the other hand, since $\alpha_s$ and $\beta_s$ do change over time as
relevant and non-relevant items are examined, the analyst may change the value of q
dynamically.

Next, we analyze the second scenario, wherein the analyst can adjust the number of sources
explored (q) dynamically according to the posteriors. If the analyst has a capacity that is at
least as large as the number of sources, then there is no risk of missing any relevant items.
However, some capacity may become idle or useless as the sources are explored over time.
Hence, instead of allocating a constant capacity for all time periods, it is more efficient to
adjust the capacity q as learning occurs.

The idea is similar to the first scenario, namely, to use a normal approximation and to
compute the percentile of the number of relevant items using the posterior mean and
variance, as in Equations 4.1 and 4.2. More precisely, we determine the value of the upper
bound:

$$\text{Upper Bound} = E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1] + \Phi^{-1}(c)\sqrt{\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1)}, \tag{4.4}$$

where $E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1]$ and
$\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1)$ are as defined above.
Notably, at time t, the posterior parameters for source s are the initial $\alpha_s$ plus the
number of relevant items detected to date and the initial $\beta_s$ plus the number of
non-relevant items up to time t.

We summarize the algorithm when the capacity q is selected dynamically in Figure 4.16. We
use the posterior mean plus two standard deviations ($E[Y_t] + 2 \times \sigma_{Y_t}$) to
update q. In the simulation, we used Beta(0.02, 0.18) to sample the $p_s$. However, assuming
no knowledge, we initialized the prior distribution of each source as Beta(1, 1). Output
for one replication of the simulation is shown in Figure 4.14, and the corresponding change
in q is shown in Figure 4.15. The red line represents the actual number of relevant items
generated by all sources throughout the time horizon, and the blue line represents the
number of relevant items discovered by the algorithm. It can be observed that with dynamic
q, the algorithm captures almost all of the relevant items, even in the exploration phase.
In this scenario, the capacity q is sufficient to explore almost all of the expected (with
respect to the posterior distribution) relevant items.

Figure 4.14: Learning with Dynamic q

Figure 4.15: Change in q

Regarding the behavior of the capacity q in Figure 4.15, its value, as determined by
Equation 4.4, decays as the uncertainty about the values of the $p_s$ probabilities is
resolved. As mentioned above, as time increases the posterior variance of $Y_t$ converges to
the sum of S Bernoulli variances. The closer the initial prior is to the true $p_s$ values,
the shorter the time until the capacity q stabilizes. We view the uniform initial prior as
a worst-case scenario absent any "wrong" knowledge.

1. Initialization:

   (a) Set a and b.

   (b) Set c.

   (c) Set t = 0.

   (d) For each source s,

       i. Draw $p_s$ from a Beta(a, b).

       ii. Set $y_{s,t} = 0$ and $n_{s,t} = 0$.

       iii. Set $\alpha_s = 1$, $\beta_s = 1$ (if there is initial information about any
            source, set accordingly).

2. Draw a sample $p_s$ for each source s from a
   Beta($\alpha_s + y_{s,t}$, $\beta_s + n_{s,t} - y_{s,t}$), and sort the values in
   increasing order $p_{(1)} \le p_{(2)} \le \cdots \le p_{(S)}$.

3. From the q sources corresponding to $p_{(S-q+1)}, \ldots, p_{(S)}$, draw a sample
   $X_{s,t}$ from Bernoulli($p_s$).

4. Set $y_{s,t} = y_{s,t-1} + 1$ if $X_{s,t} = 1$ for sampled sources.

5. Set $n_{s,t} = n_{s,t-1} + 1$ for all q sampled sources.

6. Calculate the mean and standard deviation of $Y_t$ according to the posterior
   distribution at time t.

7. Set $q = E[Y_t] + \Phi^{-1}(c)\,\sigma_{Y_t}$.

8. Set t = t + 1.

9. Go back to 2.

Figure 4.16: Pseudocode of the Dynamic Algorithm
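
The steps in Figure 4.16 can be sketched in Python as follows. This is a minimal illustration, not the simulation code used for the figures. Several details are assumptions made here: one item arrives per source per period, the capacity starts at q = S, q is rounded up and clipped to the range 1 to S, and the per-source variance is computed as $\mu(1 - \mu)$, which is the value the summand of Equation 4.2 reduces to. The name `run_dynamic` and the default arguments are hypothetical; `c = 0.975` gives a multiplier of about 1.96, close to the two standard deviations used in the text.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist


def run_dynamic(S=100, T=200, a=0.02, b=0.18, c=0.975, seed=0):
    """Simulate the dynamic-capacity loop; return (q history, found, total)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(c)                     # Phi^{-1}(c)
    p = [rng.betavariate(a, b) for _ in range(S)]   # true p_s, hidden from the analyst
    y = [0] * S                                     # relevant items seen per source
    n = [0] * S                                     # items examined per source
    q = S                                           # assumed starting capacity
    q_hist, found, total = [], 0, 0
    for _ in range(T):
        # Step 2: one posterior draw per source under Beta(1, 1) priors.
        draws = [rng.betavariate(1 + y[s], 1 + n[s] - y[s]) for s in range(S)]
        chosen = sorted(range(S), key=draws.__getitem__)[S - q:]   # q largest draws
        # Nature generates one item per source; only chosen sources are observed.
        items = [1 if rng.random() < p[s] else 0 for s in range(S)]
        total += sum(items)
        for s in chosen:                            # Steps 3 to 5
            y[s] += items[s]
            n[s] += 1
            found += items[s]
        # Steps 6 and 7: posterior mean and standard deviation of Y_t, then new q.
        mu = [(1 + y[s]) / (2 + n[s]) for s in range(S)]
        mean_y = sum(mu)
        sd_y = sqrt(sum(m * (1 - m) for m in mu))
        q = max(1, min(S, ceil(mean_y + z * sd_y)))
        q_hist.append(q)
    return q_hist, found, total


q_hist, found, total = run_dynamic()
print(q_hist[0], q_hist[-1], found, total)
```

Plotting `q_hist` against the period index gives a curve comparable in spirit to Figure 4.15, although the exact values depend on the seed and on the assumptions listed above.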


4.4  Risk  vs.  Resource  Allocated 

Having  analyzed  how  the  capacity  q  should  change  over  time,  in  this  subsection,  we  inspect 
the  effect  of  q  on  the  risk  of  missing  relevant  items.  In  order  to  do  so,  we  first  define  a  metric 
to  measure  the  risk.  We  define  the  risk  in  period  t  as  the  conditional  expectation: 


$$E[\text{fraction of relevant items not explored in period } t \mid Y_t],$$

where $Y_t$ is the total number of relevant items in period t. For example, if nature sets
the total number of relevant items at time t equal to 80, and the expected number of
relevant items explored by the algorithm, conditioned on those 80 relevant items, equals
60, then the risk is 25%.

The expression above is difficult to compute analytically because it depends on the q
sources selected by the algorithm in period t, and $Y_t$ is unknown. This is the reason for
using Thompson sampling, which randomizes the selection of the sources to sample by
selecting the largest q samples, each drawn from the posterior distribution of $p_s$.


The average risk over the time horizon t = 1, ..., T is the grand average over the time
horizon T of the risk in each period:

$$\mathrm{Risk}(T) = \frac{1}{T} \sum_{t=1}^{T} E[\text{fraction of relevant items not explored in period } t \mid Y_t].$$

The average risk is random, because it depends on the total number of relevant items in
each period, which are random and not known. The expected risk essentially removes the
conditioning on the number of relevant items in each period and can be estimated by Monte
Carlo simulation.
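
A Monte Carlo estimate of the expected risk for a fixed capacity q can be sketched as below. This is an illustration under assumptions, not the thesis implementation: one item arrives per source per period, priors are Beta(1, 1), and a period with no relevant items is counted as contributing zero risk because the fraction is undefined there. The names `average_risk` and `expected_risk` are hypothetical. The true $p_s$ are passed in and held fixed across replications.

```python
import random


def average_risk(p, q, T, rng):
    """One replication: average over periods of the fraction of relevant items missed."""
    S = len(p)
    q = min(q, S)
    y = [0] * S                                   # relevant items seen per source
    n = [0] * S                                   # items examined per source
    risk_sum = 0.0
    for _ in range(T):
        # Thompson sampling: one posterior draw per source, keep the q largest.
        draws = [rng.betavariate(1 + y[s], 1 + n[s] - y[s]) for s in range(S)]
        chosen = sorted(range(S), key=draws.__getitem__)[S - q:]
        items = [1 if rng.random() < p[s] else 0 for s in range(S)]
        total = sum(items)
        found = 0
        for s in chosen:
            y[s] += items[s]
            n[s] += 1
            found += items[s]
        # Assumption: a period with no relevant items contributes zero risk.
        if total > 0:
            risk_sum += (total - found) / total
    return risk_sum / T


def expected_risk(p, q, T=300, replications=30, seed=0):
    """Mean of average_risk over independent replications with the same p."""
    rng = random.Random(seed)
    runs = [average_risk(p, q, T, rng) for _ in range(replications)]
    return sum(runs) / len(runs)


# Hypothetical usage: S = 100 sources with p_s drawn once from Beta(0.02, 0.18).
source_rng = random.Random(123)
p_true = [source_rng.betavariate(0.02, 0.18) for _ in range(100)]
for q in (10, 20, 40):
    print(q, round(expected_risk(p_true, q), 3))
```

When q equals the number of sources, every item is examined and the estimated risk is exactly zero, which is a convenient sanity check on the bookkeeping.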


We provide a numerical example. For this purpose, after sampling the $p_s$ once as described
in Subsection 4.2.2, we run the algorithm to learn about the $p_s$ for 30 replications. The
relation between the capacity q and the expected risk obtained is shown in Table 4.2 and
Figure 4.17. In the latter, a 95% confidence interval appears in light red, centered around
the sample average.

Figure 4.17: Tradeoff between Allocated Resource per Time Period (q) and the Risk

In Figure 4.18, we observe that the risk depends on both the resource allocated and the
time horizon. Although we could estimate the risk after truncating the initial learning
period, we choose not to, so that long learning periods are penalized. If the model is to
be used for a long T, then the effect of the initial exploration period naturally tends to
zero. Otherwise, short time horizons force the analyst to choose a greater q to obtain the
same risk level as with a longer T.

Figure 4.18: Risk according to q and Time Horizon


Table 4.2: Estimated Risk for Allocated Resources (q) (S = 100 and T = 300)

q    Risk     q    Risk     q    Risk     q    Risk
10   0.33     20   0.137    30   0.06     40   0.034
11   0.299    21   0.128    31   0.06     41   0.032
12   0.269    22   0.118    32   0.054    42   0.03
13   0.239    23   0.109    33   0.051    43   0.029
14   0.223    24   0.101    34   0.048    44   0.027
15   0.201    25   0.096    35   0.047    45   0.025
16   0.186    26   0.087    36   0.044    46   0.024
17   0.174    27   0.08     37   0.04     47   0.024
18   0.164    28   0.076    38   0.039    48   0.022
19   0.153    29   0.072    39   0.034    49   0.021


4.5  Using Posteriors to Learn about the Distribution of $p_s$

As previously mentioned, it is common to be in situations with historical data for relevant
and non-relevant items, whereby each $p_s$ is sampled from a mixture distribution. The
issue for the analyst is that the membership of a $p_s$ in either one of the two components
is unobservable, but the zeros and ones can be ascribed to a particular source. It is in
situations such as these that the EM algorithm is applicable. In this section, we provide two
numerical examples of the method. The main idea is to treat the means of the posterior beta
distributions as if they were samples from the unknown distribution of $p_s$.
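
The fitting idea can be illustrated with a short Python sketch that fits a two-component Beta mixture to values in (0, 1), such as the posterior means. It is not the procedure used to produce the results in this section. The E-step computes responsibilities in the usual way, but the M-step here uses a weighted method of moments instead of exact maximum likelihood, which is a deliberate simplification. The function names, starting guesses, and clipping constant are all assumptions.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def moment_match(xs, ws):
    """Weighted method-of-moments Beta parameters (stand-in for an exact M-step)."""
    total = sum(ws)
    m = sum(w * x for w, x in zip(ws, xs)) / total
    v = sum(w * (x - m) ** 2 for w, x in zip(ws, xs)) / total
    k = m * (1 - m) / max(v, 1e-12) - 1          # equals alpha + beta
    return m * k, (1 - m) * k


def fit_beta_mixture(xs, iters=100, eps=1e-6):
    """Fit w1 * Beta(a1, b1) + w2 * Beta(a2, b2); return (weights, params)."""
    xs = [min(max(x, eps), 1 - eps) for x in xs]   # keep the logarithms finite
    params = [(2.0, 5.0), (5.0, 2.0)]              # assumed starting guesses
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = [math.log(w) + beta_logpdf(x, a, b)
                    for w, (a, b) in zip(weights, params)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            norm = sum(probs)
            resp.append([pr / norm for pr in probs])
        # Approximate M-step: update weights, then moment-match each component.
        for j in range(2):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / len(xs), 1e-9)
            params[j] = moment_match(xs, rj)
    return weights, params


# Hypothetical usage on direct draws from 0.25 x Beta(1, 5) + 0.75 x Beta(5, 1).
rng = random.Random(0)
data = [rng.betavariate(1, 5) if rng.random() < 0.25 else rng.betavariate(5, 1)
        for _ in range(1000)]
weights, params = fit_beta_mixture(data)
print(weights, params)
```

In the setting of this section, `data` would be replaced by the posterior means $(\alpha_s + y_{s,t})/(\alpha_s + \beta_s + n_{s,t})$, one per source. Sources that have been sampled only a few times have posterior means close to the prior mean, which is one reason the fit can be expected to improve as more data is collected.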

For simulation, we assume that the $p_s$ come from a mixture distribution,

0.1 × Beta(1, 5) + 0.9 × Beta(5, 1),

and generate $p_s$ for 100 sources. We set q = 20 and T = 2000.

The true density and the sampled $p_s$ are shown in Figure 4.19. The fitted density and the
means of the posteriors are shown in Figure 4.20. Parameters estimated by the algorithm are
shown in Table 4.3. Since comparatively few sources (10% expected out of 100) come from the
first component, Beta(1, 5), it was very hard to estimate its true parameters and weight.
However, its effect on the fitted distribution is low compared to component 1. When we
compare Figure 4.19 and Figure 4.20, we can conclude that the EM algorithm generated a
density that is similar to the true one.

In the second example, we assume that the $p_s$ come from the mixture

0.25 × Beta(1, 5) + 0.75 × Beta(5, 1),

and generate $p_s$ for 1000 sources. We set q = 200. The resulting fitted densities of the
mixture population at certain time points (T = 2000, 4000, 6000, 8000) are shown in
Figure 4.21. We observe that the fitted density approaches the true density (also shown in
the figure) as data becomes available.

Figure 4.19: Assumed (True) Distribution and Histogram of 100 Samples

Figure 4.20: Fitted Distribution and Histogram of 100 Means of Posteriors


Table 4.3: Comparison of the True Weights and Parameters to Those Fitted

                  True    Estimated
Component 1
    Alpha         1       4.9
    Beta          5       28.8
    Mean          0.16    0.14
    Weight        0.9     0.63
Component 2
    Alpha         5       1.65
    Beta          1       2.06
    Mean          0.84    0.44
    Weight        0.1     0.37

Figure 4.21: Fitted Mixture Distributions at Different Time Periods

These two numerical illustrations suggest that the EM algorithm is useful in scenarios
wherein nature generates relevant and non-relevant items from sources whose parameter
$p_s$ is itself sampled from a mixture distribution.


CHAPTER 5:

Conclusion and Further Study


In  this  chapter,  we  summarize  the  conclusions  drawn  from  the  analysis  and  propose  some 
suggestions  and  scenarios  for  future  research. 


5.1  Conclusion 

In this thesis, we focus on the problem of efficiently processing the vast amount of data
handled within the intelligence cycle. We propose a learning model that can be used to
allocate the available effort efficiently among the sources that generate data. We suggest
a method that can be used to dynamically adapt the amount of effort that is allocated as
data becomes available.

Our  main  conclusions  and  contributions  can  be  summarized  as  follows: 

• The model described can be used to allocate the resources/efforts for
collecting/processing efficiently.

• The suggested algorithm employed in the model yields sublinear regret in the
simulations we conducted, meaning that the average regret per period tends to zero as the
number of time periods increases.

• The model performs well when the $p_s$ are from either a pure or a mixture population.

•  The  model  can  be  adapted  to  situations  in  which  there  exists  prior  knowledge  about 
the  sources. 

• We consider the number of sources chosen/capacity as possibly changing over time as
information becomes available. With this approach, intelligence agencies can better
control the regret in the exploration phase and avoid using excess capacity as the $p_s$
values are better estimated.

•  The  model  can  also  be  employed  to  gain  insights  about  the  risk,  which  provides 
further  guidance  for  the  capacity  required. 

• We also use the EM algorithm to estimate the distributional parameters for the
candidate subpopulations as the data is collected.


5.2  Further Study

Further  studies  can  be  conducted  by  relaxing  the  assumptions  and  settings  we  established 
for  our  model  and  methods. 

First,  the  number  of  sources  can  be  permitted  to  change,  as  some  sources  leave  and  new 
ones  come.  As  an  example  of  relaxing  this  assumption,  one  might  consider  translating  a 
number  of  Twitter  messages.  Here,  the  Twitter  accounts  are  the  sources,  and  the  messages 
are  the  items.  Some  of  the  sources  may  be  inactive  for  some  period  of  time;  there  may  also 
be  new  accounts  to  look  into  or  others  that  close. 

Second, the $p_s$ probabilities may be permitted to change over time. Third, and in our
view most challenging, is to capture the dependencies between the sources and the item
values over time. We intentionally did not specify what the source and the
collection/processing asset are. Focusing on a particular asset would determine the
setting and assumptions.